ABSTRACT


As the number and bandwidth of sensors increase, an acute demand arises for preprocessing
sensor data obtained for machine-based decision making. Especially in a data fusion
context, the data from numerous sensors must first be preprocessed to prevent saturation of
the decision making mechanism, be it man or machine.

Presented is a general preprocessing approach which provides a compact representation
(feature vector) of sensor data. The approach, supported by a signal decomposition theorem,
adaptively models, in recursive fashion, the detrended sensor data as an autoregressive
(AR) process of sufficiently high order. Provisions are included to accommodate nonstationary
data by incorporating an information-theoretic transition detector to identify the
segments of near-stationary data. Together, feature vectors (AR coefficients) are produced
over near-stationary data segments which are scale invariant, translation invariant, normalized,
and represent sufficient statistics. Furthermore, the merit of the preprocessor is quantitatively
determined in a continuous manner from the resulting innovations (modeling error
process).

Specific  application  results  utilizing  nonstationary  radar  data  demonstrate  the  ability  to 
simultaneously  reduce  data  and  maintain  information  content,  without  requiring  a  priori 
statistics  and/or  expert  rules. 
ADAPTIVE PREPROCESSING OF
NONSTATIONARY SIGNALS

1.  INTRODUCTION 


1.1  MOTIVATION 

As the number and bandwidth of sensors increase, an acute demand arises for preprocessing sensor data
obtained for machine-based decision making. Especially in a data fusion context, where the information
from numerous sensors is combined to yield high-level decisions, the data from individual sensors
must be preprocessed before fusion to prevent saturation of the decision making mechanism, be it man or
machine.

Preprocessing can be viewed as constructing an alternative representation of the data, one which provides
desired invariances and reduces redundancy. In short, the primary objective of preprocessing is to
produce a compact representation (feature vector) of the sensor data which simultaneously

•  Reduces  data  and 

•  Maintains  information  content. 

1.2  GENERAL  APPROACH 

A general approach which satisfies the above preprocessing objectives is driven by Wold's decomposition
theorem [1]. The theorem states that any stationary discrete-time process can be
decomposed into a summation of a deterministic signal and an autoregressive (AR) process of sufficiently
high order [2]. Consequently, a dynamic preprocessor following Wold's theorem first entails detrending
the data for deterministic quantities and subsequently fitting an AR model to the remaining process, as
illustrated in Figure 1-1.

Unfortunately, for most real situations the sensor data are nonstationary. However, many
nonstationary real signals are piecewise stationary. That is, although the signal statistics may vary significantly
over the complete data record, localized regions can be identified where the statistics remain constant.
Consequently, Wold's theorem can also be applied to nonstationary signals provided the near-stationary
segments can be identified. Hence, by employing a statistical transition detector for identifying the
near-stationary segments, a general preprocessor architecture for nonstationary signals can be constructed (see
Figure 1-2).
Figure 1-1. Preprocessing architecture for stationary signals.
Figure 1-2. Preprocessing architecture for nonstationary signals. A feature vector $V(I_k)$ is produced over each
near-stationary segment $I_k$.


1.3  ORGANIZATION 

The  organization  of  this  report  follows  the  nonstationary  preprocessor  architecture  (Figure  1-2) 
comprising  a  dynamic  AR  modeling  module  and  transition  detector. 

In  Section  2,  an  adaptive  AR  modeling  algorithm  is  developed  to  satisfy  objectives  derived  from 
demands  posed  by  a  real  sensor  environment,  including  unknown  a  priori  statistical  information  and  fast 
throughput.  Hence,  a  recursive  algorithm  is  sought  for  dynamically  building  an  AR  model  for  each  of  the 
near-stationary  segments  of  data.  To  identify  these  segments,  in  Section  3  a  transition  detector  is  developed 
which  couples  well  with  the  previously  developed  recursive  AR  modeling  algorithm.  Both  detection  and 
estimation of transition times are achieved. The complete set of preprocessing algorithms is presented
in Section 4.

The results obtained by applying the developed preprocessing algorithms to real data are presented in
Section 5, and issues concerning detrending and model order selection are addressed. A summary of the
results is followed by the conclusion section, providing direction for further research.
2. ADAPTIVE AR MODELING


2.1  INTRODUCTION 

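Motivated by Wold's theorem, consider modeling a stochastic process with an autoregressive (AR)
model of order $M$. Let

$$ y(n) = \sum_{i=1}^{M} w_i^{*}\, y(n-i) + e(n) = \mathbf{w}^{H} \mathbf{y}(n-1) + e(n) \tag{2.1} $$

where

$$ \mathbf{y}(n-1) = [\, y(n-1),\; y(n-2),\; \ldots,\; y(n-M) \,]^{T} \quad \text{(last $M$ samples)} \tag{2.2} $$

$$ \mathbf{w} = [\, w_1,\; w_2,\; \ldots,\; w_M \,]^{T} \quad \text{(AR coefficients)} \tag{2.3} $$
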

and $e(n)$ represents the modeling error, often referred to as the "innovation process." The schematic
representing the AR modeling of the random process $y(n)$ is shown in Figure 2-1. The diagram can be
viewed as a digital transversal whitening filter, since the output $e(n)$ should be white noise of variance
$\sigma^{2}$ if the colored process $y(n)$ is truly an AR$(\mathbf{w}, M)$ process.


Figure 2-1. Digital transversal whitening filter representing the AR modeling process. The tap weights $\mathbf{w}$
are specified a priori.
The optimal set of coefficients $\mathbf{w}^{o}$, resulting in an uncorrelated innovation process, is obtained by
solving a matrix equation [3,4] derived from the autocorrelation function of the process $y(n)$. Assuming
$y(n)$ represents a stationary segment of data, the autocorrelation function becomes (see Appendix A)

$$ r(m) = E\{ y(n+m)\, y^{*}(n) \} =
\begin{cases}
\sum_{i=1}^{M} w_i^{o*}\, r(m-i) + h(0)\,\sigma^{2}, & m = 0 \\
\sum_{i=1}^{M} w_i^{o*}\, r(m-i), & m > 0 \\
r^{*}(-m), & m < 0
\end{cases} \tag{2.4} $$

or equivalently for lags $m = 1, 2, \ldots, M$

$$ \mathbf{R}\,\mathbf{w}^{o} = \mathbf{r} \tag{2.5} $$

where

$$ \mathbf{R} = E\{ \mathbf{y}(n-1)\, \mathbf{y}^{H}(n-1) \} =
\begin{bmatrix}
r(0) & r(1) & \cdots & r(M-1) \\
r(-1) & r(0) & \cdots & r(M-2) \\
\vdots & \vdots & \ddots & \vdots \\
r(-M+1) & r(-M+2) & \cdots & r(0)
\end{bmatrix} \tag{2.6} $$

$$ \mathbf{r} = E\{ \mathbf{y}(n-1)\, y^{*}(n) \} = [\, r(-1),\; r(-2),\; \ldots,\; r(-M) \,]^{T} \tag{2.7} $$


Thus, solving the matrix equation $\mathbf{R}\mathbf{w}^{o} = \mathbf{r}$ yields the AR coefficients $\mathbf{w}^{o}$. However, such a batch
processing procedure requires matrix inversion as well as a priori knowledge of the autocorrelation function.
It is therefore not suitable for the current problem, which requires a recursive algorithm that operates in the
absence of a priori input process statistics and can adapt to changing statistical environments.
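
For illustration, the batch solution of Equation (2.5) can be sketched in a few lines. This is a minimal sketch under stated assumptions rather than part of the original development: the data are taken to be real valued (so conjugates drop and $r(-m) = r(m)$), the helper names `solve_linear` and `yule_walker` are hypothetical, and the autocorrelation values in the usage lines are illustrative only.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Bring the largest remaining entry of this column onto the diagonal.
        piv = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for row in range(col + 1, n):
            f = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def yule_walker(r, M):
    """AR coefficients from autocorrelation lags r[0..M] (real-valued assumption)."""
    R = [[r[abs(i - j)] for j in range(M)] for i in range(M)]  # Toeplitz matrix R
    return solve_linear(R, r[1:M + 1])                          # right-hand side r


# Illustrative lags r(0), r(1), r(2); scaling all lags leaves the solution unchanged.
w = yule_walker([1.0, 0.5, 0.1], 2)
w_scaled = yule_walker([4.0, 2.0, 0.4], 2)
```

The sketch makes the drawback visible: the full set of lags must be known (or estimated) before any coefficients are available, and the system must be re-solved whenever the statistics change.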

2.2  OPTIMAL  LINEAR  PREDICTION  EQUIVALENCE 

The  desired  adaptive  recursive  algorithm  for  computing  the  model  coefficients  is  readily  obtained  by 
alternatively  viewing  the  task  at  hand  as  an  optimal  linear  prediction  problem  in  the  sense  of  Wiener  [5]. 

Reformulating, consider predicting $y(n)$ given the past $M$ samples $\mathbf{y}(n-1)$. For optimal linear
prediction, the output $f(n)$ is simply a linear combination of $\mathbf{y}(n-1)$ expressed as

$$ f(n) = \mathbf{w}^{H} \mathbf{y}(n-1). \tag{2.8} $$

The objective is to choose the tap weights $\mathbf{w}$ (AR model coefficients) resulting in the best performance.
Here, best performance is defined as minimum mean-squared error (mse).

Expressing the error between the desired and the linear filter output by rearranging Equation (2.1),

$$ e(n) = y(n) - f(n), \tag{2.9} $$

the performance measure (mse) becomes

$$ J(\mathbf{w}) = E\{ e(n)\, e^{*}(n) \}, \tag{2.10} $$

representing the average innovation power or, equivalently, the variance of the prediction error (AR
modeling error). Now upon substitution

$$ J(\mathbf{w}) = E\{ ( y(n) - \mathbf{w}^{H}\mathbf{y}(n-1) )( y^{*}(n) - \mathbf{y}^{H}(n-1)\mathbf{w} ) \}
 = r(0) - \mathbf{w}^{H}\mathbf{r} - \mathbf{r}^{H}\mathbf{w} + \mathbf{w}^{H}\mathbf{R}\mathbf{w}, \tag{2.11} $$

the performance measure is seen to be quadratic in the weight vector, and hence yields a parabolic surface
with a unique minimum.

Consequently, the optimal tap weights are obtained by minimizing $J(\mathbf{w})$ using the zero-derivative
criterion,

$$ \frac{\partial J(\mathbf{w})}{\partial \mathbf{w}} = -2\mathbf{r} + 2\mathbf{R}\mathbf{w} = \mathbf{0}, \tag{2.12} $$

or $\mathbf{R}\mathbf{w}^{o} = \mathbf{r}$.

And  thus  the  equivalence  is  established  between  AR  modeling  and  optimal  linear  prediction,  for  the  optimal 
predictor  tap  weights  (impulse  response  of  the  linear  prediction  filter)  are  equivalent  to  the  AR  coefficients 
[Equation  (2.5)]. 

2.3  RECURSIVE  STEEPEST-DESCENT  AR  MODEL  FORMATION 

Returning to the objective of obtaining an adaptive recursive algorithm for the AR coefficients, notice
that under the optimal prediction interpretation a performance measure (mse) was defined, guiding the
selection of the coefficients. Supplied with such a measure, a recursive algorithm can now be formulated
following the method of steepest descent. The weight vector (AR model coefficients) at time $n + 1$ is given by

$$ \mathbf{w}(n+1) = \mathbf{w}(n) - \eta \left. \frac{\partial J(\mathbf{w})}{\partial \mathbf{w}} \right|_{\mathbf{w}(n)}
 = \mathbf{w}(n) + 2\eta\, ( \mathbf{r} - \mathbf{R}\mathbf{w}(n) ). \tag{2.13} $$


Figure 2-2. Performance measure surface as a function of the tap weights.


Furthermore, when the process $y(n)$ is stationary and the statistics $\mathbf{R}$ are known a priori, the algorithm
converges to the optimal weights regardless of the initial conditions, provided the rate of adaption is
bounded by

$$ 0 < \eta < \frac{1}{\lambda_{\max}} \tag{2.14} $$

where $\lambda_{\max}$ is the largest eigenvalue of the correlation matrix $\mathbf{R}$ [2].
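
The recursion of Equation (2.13) and the role of the bound in Equation (2.14) can be illustrated with a small numerical sketch. It assumes real-valued quantities and known statistics; the function name `steepest_descent` and the 2 x 2 example values are hypothetical and chosen only so that the eigenvalues (0.5 and 1.5) are easy to read off.

```python
def steepest_descent(R, r, eta, n_iter):
    """Iterate w(n+1) = w(n) + 2*eta*(r - R w(n)) starting from w(0) = 0."""
    M = len(r)
    w = [0.0] * M
    for _ in range(n_iter):
        Rw = [sum(R[i][j] * w[j] for j in range(M)) for i in range(M)]
        w = [w[i] + 2.0 * eta * (r[i] - Rw[i]) for i in range(M)]
    return w


R = [[1.0, 0.5], [0.5, 1.0]]   # illustrative correlation matrix, lambda_max = 1.5
r = [0.5, 0.1]                 # illustrative cross-correlation vector

w_ok = steepest_descent(R, r, eta=0.3, n_iter=200)    # eta < 1/1.5: converges
w_bad = steepest_descent(R, r, eta=0.8, n_iter=200)   # eta > 1/1.5: diverges
```

With the smaller step the iterate settles at the solution of $\mathbf{R}\mathbf{w} = \mathbf{r}$ for these values, namely (0.6, -0.2); with the larger step the component along the eigenvector of the largest eigenvalue grows without bound.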

Although the steepest-descent algorithm [Equation (2.13)] is recursive and guaranteed to converge to
the desired AR coefficients, the requirement of the signal correlation matrix ($\mathbf{R}$) prohibits its use under the
previously imposed objectives. However, an approximation to steepest descent, namely the least mean
square (LMS) algorithm [6], can be utilized without knowing $\mathbf{R}$ a priori.

2.4  RECURSIVE  LMS  AR  MODEL  FORMULATION 

The formulation of the LMS algorithm follows by replacing the statistical quantities $\mathbf{R}$ and $\mathbf{r}$ in
Equation (2.13) by their instantaneous estimates. That is, substitute $\mathbf{y}(n-1)\,\mathbf{y}^{H}(n-1)$ and
$\mathbf{y}(n-1)\,y^{*}(n)$ for $\mathbf{R}$ and $\mathbf{r}$, respectively, in Equation (2.13) to yield

$$ \begin{aligned}
\mathbf{w}(n+1) &= \mathbf{w}(n) + 2\eta\, ( \mathbf{y}(n-1)\, y^{*}(n) - \mathbf{y}(n-1)\, \mathbf{y}^{H}(n-1)\, \mathbf{w}(n) ) \\
 &= \mathbf{w}(n) + 2\eta\, \mathbf{y}(n-1)\, ( y(n) - \mathbf{w}^{H}(n)\, \mathbf{y}(n-1) )^{*} \\
 &= \mathbf{w}(n) + 2\eta\, \mathbf{y}(n-1)\, e^{*}(n).
\end{aligned} \tag{2.15} $$

The LMS algorithm is often initialized with $\mathbf{w}(0) = \mathbf{0}$ and is diagramed in Figure 2-3. In comparison
with Figure 2-1, this recursive algorithm results in an adaptive transversal whitening filter which is time
varying and nonlinear.


Figure 2-3. Adaptive digital transversal whitening filter representing an adaptive AR modeling process. Here the tap
weights $\mathbf{w}(n)$ are determined adaptively.


Further examination of the algorithm reveals the weights at time $n + 1$ to be explicitly dependent upon the
last $M + 1$ values of the random input process [through the product of $\mathbf{y}(n-1)$ and the innovations
$e(n)$] and implicitly dependent upon all past values of the random input process [through the previous
weight vector $\mathbf{w}(n)$]. Hence, the memory of the filter is characterized by the choice of the adaption
(learning) parameter $\eta$: for small $\eta$ the filter memory is long, wherein the dependence is primarily implicit
amongst all past values of $y(n)$, and for large $\eta$ the filter memory is short, wherein the dependence is
primarily explicit amongst the past $M + 1$ samples of $y(n)$ (see Figure 2-4).


Figure  2-4.  Dependence  of  the  weight  adaptions  upon  the  input  data. 


Considering the LMS convergence properties, clearly by employing the instantaneous estimates,
the weights are expected to fluctuate during the iterative process. However, under certain independence
assumptions [7], or under specific statistical correlation properties of the signal [8], both the ensemble
average of the LMS weight vector and the ensemble average of the mse converge to the optimal values,
provided the rate of adaption is bounded by

$$ 0 < \eta < \frac{1}{\sum_{i=1}^{M} \lambda_i} \tag{2.16} $$


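where $\lambda_i$ are the eigenvalues of the correlation matrix. Moreover, this criterion [Equation (2.16)] is
simplified by observing for a stationary $y(n)$ process

$$ M\, r(0) = \mathrm{tr}[\mathbf{R}] = \mathrm{tr}[\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^{-1}] = \mathrm{tr}[\mathbf{Q}^{-1}\mathbf{Q}\boldsymbol{\Lambda}] = \mathrm{tr}[\boldsymbol{\Lambda}] = \sum_{i=1}^{M} \lambda_i \tag{2.17} $$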


where $\mathbf{Q}$ is the matrix of eigenvectors for $\mathbf{R}$ and $\boldsymbol{\Lambda}$ is the diagonal matrix containing the eigenvalues.
Thus the convergence in mean and mean square of the AR coefficients, when $y(n)$ is a stationary process
conforming to the various assumptions, is guaranteed provided

$$ 0 < \eta < \frac{1}{M\, r(0)}. \tag{2.18} $$
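
In practice the bound of Equation (2.18) only needs the signal power. The sketch below is illustrative and rests on assumptions not stated in the text: $r(0)$ is estimated by the mean power of a zero-mean, real-valued record, and the function name `lms_step_bound` is hypothetical. The last lines check the trace identity of Equation (2.17) for a 2 x 2 symmetric Toeplitz matrix, whose eigenvalues are $r(0) \pm r(1)$.

```python
def lms_step_bound(y, M):
    """Upper bound 1 / (M * r(0)) on eta, with r(0) estimated as the mean power of y."""
    r0 = sum(v * v for v in y) / len(y)
    return 1.0 / (M * r0)


bound = lms_step_bound([1.0, -1.0, 1.0, -1.0], M=2)   # unit power, M = 2

# Trace identity for R = [[r0, r1], [r1, r0]]: the eigenvalues sum to M * r(0).
r0, r1 = 1.0, 0.5
eigenvalues = [r0 + r1, r0 - r1]
trace_from_eigenvalues = sum(eigenvalues)
```

Because only $r(0)$ is required, the bound can be evaluated without forming or decomposing the correlation matrix.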
2.5 SUMMARY OF ADAPTIVE AR MODELING ALGORITHM


The LMS algorithm suited for recursively determining the AR model coefficients describing a
stochastic process $y(n)$ in the absence of complete a priori statistics is summarized below.

A Priori Parameters

$$ M = \text{AR model order} \tag{2.19} $$

$$ \eta = \text{adaption (learning) parameter} \quad \left( 0 < \eta < \frac{1}{M\, r(0)} \right) \tag{2.20} $$

Initial Conditions

$$ \mathbf{w}(0) = \mathbf{0} \tag{2.21} $$

$$ \mathbf{y}(0) = \mathbf{0} \tag{2.22} $$

Innovations

$$ e(n) = y(n) - \mathbf{w}^{H}(n)\, \mathbf{y}(n-1) = y(n) - \sum_{i=1}^{M} w_i^{*}(n)\, y(n-i) \tag{2.23} $$

AR Coefficients

$$ \mathbf{w}(n+1) = \mathbf{w}(n) + 2\eta\, \mathbf{y}(n-1)\, e^{*}(n) \tag{2.24} $$

or

$$ w_i(n+1) = w_i(n) + 2\eta\, y(n-i)\, e^{*}(n) \tag{2.25} $$
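
The summarized recursion can be sketched as follows. This is a minimal illustration under assumptions, not the report's implementation: the data are real valued (so conjugates drop), samples before the start of the record are taken as zero, and the names `lms_ar`, `simulate_ar2`, and `feature_vector` are hypothetical. The synthetic second-order process is used only to exercise the code.

```python
import random


def lms_ar(y, M, eta):
    """Adaptive AR modeling by LMS; returns the innovations and the weight history."""
    w = [0.0] * M
    innovations, w_hist = [], []
    for n in range(len(y)):
        # Past samples y(n-1), ..., y(n-M), zero before the start of the record.
        past = [y[n - i] if n - i >= 0 else 0.0 for i in range(1, M + 1)]
        e = y[n] - sum(wi * pi for wi, pi in zip(w, past))        # innovation e(n)
        innovations.append(e)
        w_hist.append(list(w))                                     # weights used at time n
        w = [wi + 2.0 * eta * pi * e for wi, pi in zip(w, past)]   # weight update
    return innovations, w_hist


def simulate_ar2(n, a1, a2, seed):
    """Synthetic real AR(2) record driven by unit-variance Gaussian noise."""
    rng = random.Random(seed)
    y = [0.0, 0.0]
    for _ in range(n):
        y.append(a1 * y[-1] + a2 * y[-2] + rng.gauss(0.0, 1.0))
    return y[2:]


def feature_vector(w_hist, start, stop):
    """Average the instantaneous weights over the segment [start, stop)."""
    seg = w_hist[start:stop]
    return [sum(w[i] for w in seg) / len(seg) for i in range(len(seg[0]))]


y = simulate_ar2(20000, 0.6, -0.2, seed=0)
e, W = lms_ar(y, M=2, eta=0.002)
v = feature_vector(W, len(W) // 2, len(W))   # average over the converged half
```

Averaging the instantaneous weights over a segment, rather than taking the final weight vector, reduces the fluctuation that the instantaneous estimates introduce.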
3. TRANSITION DETECTION


3.1  OBJECTIVE 

Recall  that  the  central  objective  of  preprocessing  a  nonstationary  signal  is  to  produce  feature  vectors 
over  near-stationary  data  segments.  These  feature  vectors,  subsequently  used  for  signal  classification, 
ideally  provide  data  reduction  while  maintaining  information  content. 

Consequently,  identifying  the  near-stationary  intervals  comprising  the  nonstationary  signal  is  of  prime 
importance  in  feature  vector  extraction.  And  since  the  boundaries  of  the  near-stationary  segments  coincide 
with  a  change  in  the  spectral  characteristics  of  the  signal,  a  spectral  transition  detection  algorithm  is 
warranted. 

In addition to detecting a change in spectral character, an algorithm is sought which also estimates the
actual transition occurrence ($T_0$) with short delay ($T_d - T_0$) and few false alarms (see Figure 3-1).


Figure 3-1. Piecewise stationary signal with spectral transition at $T_0$ and subsequent detection at $T_d$.


Furthermore, in harmony with the previously mentioned AR modeling objectives, the algorithm should be
recursive and operational in environments where the signal statistics are unknown both before and
after transition.

A candidate algorithm for transition detection is presented which couples nicely to the previously
developed recursive AR modeling algorithm. The algorithm discussed is a dual window approach which
counters many limitations of the classical single window technique, which essentially tests how far the
innovations arising from the modeling depart from the white noise hypothesis.




3.2 DUAL WINDOW APPROACH


To counter the limitations of the classical single window approach (including large variances before
transition, unpredictable behavior following transitions where a decrease in signal energy occurs, and
requirement of a priori reference information), a technique utilizing both global and local windows is adopted.
The approach introduced by Basseville and Benveniste [9] offers better behavior before transitions and
yields larger drifts in the test statistic following detection, thereby improving detection capability.

The approach, like numerous other transition detection algorithms, employs a cumulative sum statistic
of the form

$$ u(n) = \sum_{k=1}^{n} T(k) \tag{3.1} $$

[$T(k)$ to be determined] whose drift properties signal a change in spectral characteristics. The input
driving the statistic is the modeling error or innovation process, as diagramed in Figure 3-2.


Figure  3-2.  General  transition  detector  based  on  cumulative  sum  statistic. 


Basically, the integration effect provided by a cumulative sum statistic provides more reliable detection
capability in noisy environments (i.e., where the true AR model coefficients are unknown). Thus, instead of
detecting a change in the absolute mean of the innovation process at a transition, a more sensitive change
in the drift of $u(n)$ is detected, as shown in Figure 3-3.

The distinctive feature of this approach, however, is the utilization of two windows. The global
reference window expands during the process, allowing all information in the stationary segment under the
distribution $P_a$ to be included in the AR model building. Utilizing all information enables better modeling
(estimation of the AR coefficients) of $y(n)$ before the transition. In contrast, the fixed local window uses
only the most recent information. When utilizing a dual window transition detection approach in conjunction
with recursive AR modeling, the global and local windows can be effectively implemented by simply
choosing appropriate learning parameters. For example, when using two LMS filters (per Section 2.4) for
recursively estimating the local and global AR coefficients simultaneously, the learning parameter for the
global filter ($\eta_a$) is chosen smaller than the learning parameter for the local filter ($\eta_b$) within the constraint
[Equation (2.18)]. Consequently, the global filter with long memory (small $\eta_a$) coincides with the
expanding global window, while the local filter with short memory (large $\eta_b$) corresponds to the fixed local
window.
Figure 3-3. Properties of the (a) signal, (b) innovation, and (c) cumulative sum statistic at a spectral transition
(true modeling coefficients are unknown).


Now as the region of transition is approached [see Figure 3-4 (b) and (c)], the information-theoretic
distance metric between the distribution laws $P_a$ and $P_b$ extracted from the global and local windows
increases, since the global reference model with long memory is virtually unaffected by the most recent
information. Once the distance measure exceeds a given threshold, the transition is detected and the reference
window reinitialized [see Figure 3-4 (d)].

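The particular distance measure used to gauge the discrepancy between the adaptive filters modeling
the signal based upon complete (global window) and partial (local window) information is given by

$$ T(k) = \ell_{P_a P_b}( y(k) \mid \mathbf{y}(k-1) ) - E_{P_a}\{ \ell_{P_a P_b}( y \mid \mathbf{y}(k-1) ) \} \tag{3.2} $$

where

$$ \ell_{P_a P_b}( y(k) \mid \mathbf{y}(k-1) ) = \log \frac{ p_a( y(k) \mid \mathbf{y}(k-1) ) }{ p_b( y(k) \mid \mathbf{y}(k-1) ) } \tag{3.3} $$

$$ E_{P_a}\{ \ell_{P_a P_b}( y \mid \mathbf{y}(k-1) ) \} = \int p_a( y \mid \mathbf{y}(k-1) )\, \log \frac{ p_a( y \mid \mathbf{y}(k-1) ) }{ p_b( y \mid \mathbf{y}(k-1) ) }\, dy \tag{3.4} $$

(conditional Kullback's information [10])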
Figure 3-4. Dual window transition detector operation: (a) initialization; (b) spectral transition (maneuver);
(c) transition detection and feature vector $V(I_k)$ produced; (d) reinitialization.


resulting in the cumulative sum statistic given by Equation (3.1). Notice from Equation (3.2) that the
measure is a difference in the average and instantaneous discrepancies in the distributions representing
the data from the global and local windows.

The desirable properties of this test statistic are revealed in part by examining the drifts in the
conditional mean value before and after transition. Before transition

$$ D_a = E_{P_a}\{ u(n) - u(n-1) \} = E_{P_a}\{ T(n) \} = 0, \tag{3.5} $$

while after the transition

$$ D_b = E_{P_b}\{ u(n) - u(n-1) \}
 = \int [\, p_b( y \mid \mathbf{y}(n-1) ) - p_a( y \mid \mathbf{y}(n-1) ) \,]\, \log \frac{ p_a( y \mid \mathbf{y}(n-1) ) }{ p_b( y \mid \mathbf{y}(n-1) ) }\, dy < 0 \tag{3.6} $$

(conditional Kullback's divergence).

Thus, zero conditional drift in $u(n)$ occurs before the transition, while a negative drift occurs after the
transition. Consequently, the detection of a statistical transition in the process $y(n)$ can be accomplished
by detecting a nonzero drift in $u(n)$.


3.3  TRANSITION  DETECTION  ALGORITHM 


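Examining Equation (3.2), obtaining an explicit algorithm for the cumulative sum test statistic requires
knowledge of the functional form of the distributions. Now if the input process is assumed jointly Gaussian,
the resulting transition detection algorithm couples nicely with the previous recursive AR modeling
algorithm.

In particular, once the recursive AR modeling coefficients have converged to the optimal values
(equivalently, the adaptive filter's impulse response has converged to the optimal tap weights;
$\mathbf{w} \to \mathbf{w}^{o}$ such that $\mathbf{R}\mathbf{w}^{o} = \mathbf{r}$ and $\sigma^{2} = r(0) - \mathbf{w}^{oH}\mathbf{r}$), the cumulative sum test statistic for detecting statistical
transitions is given by Equation (3.1) with (see Appendices B and C)

$$ T(k) = \frac{1}{2} \left[ \frac{|e_b(k)|^{2}}{\sigma_b^{2}} - \frac{|e_a(k)|^{2}}{\sigma_a^{2}} - \frac{|e_a(k) - e_b(k)|^{2}}{\sigma_b^{2}} + 1 - \frac{\sigma_a^{2}}{\sigma_b^{2}} \right] \tag{3.7} $$

where $e_a(k)$, $e_b(k)$ and $\sigma_a^{2}$, $\sigma_b^{2}$ are the innovations and innovation variances of the global and local
models, respectively.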


The associated schematic representing the algorithm [Equation (3.7)] is shown in Figure 3-5. Notice
the statistic represents a normalized distance measure between the innovation processes derived from the
local and global data windows. Specifically, the statistic is driven by both the squared difference and the
difference in squares of the respective innovation processes.
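
A sketch of one increment of the statistic under the Gaussian assumption is given below. It is a hedged illustration rather than a transcription of the report's appendices: it assumes real-valued innovations, it uses the form obtained by evaluating the difference between the instantaneous and the $P_a$-averaged log-likelihood ratio with Gaussian conditional densities (zero mean before a transition, negative mean after), and both function names are hypothetical. The second function is an algebraically equivalent product form included only as a cross-check.

```python
def divergence_increment(e_a, e_b, var_a, var_b):
    """Increment T(k) from global (a) and local (b) innovations and their variances.

    Written in terms of the difference in normalized squares and the squared
    difference of the two innovations (real-valued Gaussian assumption).
    """
    return 0.5 * (e_b * e_b / var_b
                  - e_a * e_a / var_a
                  - (e_a - e_b) ** 2 / var_b
                  + 1.0 - var_a / var_b)


def divergence_increment_product_form(e_a, e_b, var_a, var_b):
    """Same quantity rearranged around the cross product e_a * e_b."""
    ratio = var_a / var_b
    return 0.5 * (2.0 * e_a * e_b / var_b
                  - (1.0 + ratio) * e_a * e_a / var_a
                  + (1.0 - ratio))


t_example = divergence_increment(1.0, 2.0, 1.0, 4.0)
```

When the two models agree exactly (equal innovations and equal variances) the increment is zero, so the cumulative sum carries no drift.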

3.4  TRANSITION  TIME  ESTIMATION  ALGORITHM 

Following detection of a spectral transition utilizing the cumulative sum statistic, the actual time of
transition must be estimated. The basic estimation task is illustrated in Figure 3-6, where the objective is to
provide an estimate of the transition time with minimal delay ($T_d - T_0$) following detection.

An optimal approach developed by Hinkley [11] for minimizing the detection delay time ($T_d - T_0$),
assuming a fixed average time between false alarms, involves assigning a drift bias $\delta$ to the test statistic.
Thus, instead of estimating the point of departure from zero drift as depicted in Figure 3-6, the transition
estimate is simply the time where the biased cumulative sum statistic attains minimal value in the
neighborhood of the alarm per Figure 3-7.
Figure 3-5. Transition detection algorithm architecture.
Figure 3-6. Cumulative sum statistic behavior at a transition.
Figure 3-7. Biased cumulative sum statistic behavior at a transition.
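Consequently, the complete transition detection and estimation algorithm is given by

Statistics

$$ u(n) = \sum_{k=1}^{n} [\, T(k) - \delta \,] \tag{3.8} $$

$$ T(k) = \frac{1}{2} \left[ \frac{|e_b(k)|^{2}}{\sigma_b^{2}} - \frac{|e_a(k)|^{2}}{\sigma_a^{2}} - \frac{|e_a(k) - e_b(k)|^{2}}{\sigma_b^{2}} + 1 - \frac{\sigma_a^{2}}{\sigma_b^{2}} \right] \tag{3.9} $$

Detection Rule

$$ u(n) - m(n) \;
\begin{cases}
> h & \Rightarrow \text{detection} \\
< h & \Rightarrow \text{no detection}
\end{cases} \tag{3.10} $$

where

$$ m(n) = \min_{0 \le k \le n} u(k) \tag{3.11} $$

Estimate

$$ \hat{T}_0 = n_0 \;\ni\; u(n_0) = \min_{k \le T_d} u(k) \tag{3.12} $$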
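
The detection rule and the transition time estimate can be sketched together, since both follow from tracking the running minimum of the biased cumulative sum. The sketch below is illustrative and makes its sign convention explicit as an assumption: it takes increments whose mean exceeds the bias $\delta$ after a transition, so that the statistic falls before the transition and rises after it; an increment defined with negative post-transition drift would need its sign reversed before being passed in. The function name `detect_transition` and the test sequence are hypothetical.

```python
def detect_transition(increments, delta, h):
    """Biased cumulative sum test with running-minimum tracking.

    Returns (detection_time, estimated_transition_time) using 1-based sample
    indices, or None when the threshold h is never exceeded.
    """
    u = 0.0       # biased cumulative sum, u(0) = 0
    m = 0.0       # running minimum m(n) of u(k), 0 <= k <= n
    n_min = 0     # time index at which the minimum was attained
    for n, t in enumerate(increments, start=1):
        u += t - delta
        if u < m:
            m, n_min = u, n
        if u - m > h:
            return n, n_min
    return None


# Fifty samples with zero-mean increments followed by fifty with unit increments.
result = detect_transition([0.0] * 50 + [1.0] * 50, delta=0.2, h=5.0)
```

For this toy sequence the minimum is reached at the last pre-transition sample, and the alarm is raised a few samples later, once the statistic has climbed by more than the threshold; a larger threshold lengthens the delay but makes false alarms less likely.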
4. SUMMARY OF PREPROCESSING ALGORITHMS


Combining  the  results  from  the  adaptive  recursive  AR  modeling  and  the  dual  window  transition 
detection  results  in  the  following  set  of  algorithms  for  preprocessing  nonstationary  data. 


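ADAPTIVE AR MODELING

Quantities with subscript $a$ belong to the global model (long memory); quantities with subscript $b$
belong to the local model (short memory).

A Priori Parameters

$M$ = AR model order
$\eta_a$, $\eta_b$ = adaption (learning) parameters ($0 < \eta_a < \eta_b$)

Initial Conditions

$$ \mathbf{w}_a(0) = \mathbf{0}, \qquad \mathbf{w}_b(0) = \mathbf{0} \tag{4.1a,b} $$

$$ \mathbf{y}(0) = \mathbf{0}, \qquad \mathbf{y}(0) = \mathbf{0} \tag{4.2a,b} $$

Innovations

$$ e_a(n) = y(n) - \sum_{i=1}^{M} w_{a,i}^{*}(n)\, y(n-i), \qquad
   e_b(n) = y(n) - \sum_{i=1}^{M} w_{b,i}^{*}(n)\, y(n-i) \tag{4.3a,b} $$

AR Coefficients

$$ w_{a,i}(n+1) = w_{a,i}(n) + 2\eta_a\, y(n-i)\, e_a^{*}(n), \qquad
   w_{b,i}(n+1) = w_{b,i}(n) + 2\eta_b\, y(n-i)\, e_b^{*}(n) \tag{4.4a,b} $$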




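TRANSITION DETECTION

A Priori Parameters

$\delta$ = drift bias (empirically determined)
$h$ = threshold (empirically determined)

Initial Conditions

$$ u(0) = 0 \tag{4.5} $$

Cumulative Sum Statistic

$$ T(n) = \frac{1}{2} \left[ \frac{|e_b(n)|^{2}}{\sigma_b^{2}} - \frac{|e_a(n)|^{2}}{\sigma_a^{2}} - \frac{|e_a(n) - e_b(n)|^{2}}{\sigma_b^{2}} + 1 - \frac{\sigma_a^{2}}{\sigma_b^{2}} \right] \tag{4.6} $$

$$ u(n) = u(n-1) + T(n) - \delta \tag{4.7} $$

Detection Rule

$$ u(n) - m(n) \;
\begin{cases}
> h & \Rightarrow \text{detect transition } (T_d) \\
< h & \Rightarrow \text{no transition}
\end{cases} \tag{4.8} $$

where

$$ m(n) = \min_{0 \le k \le n} u(k) $$

Transition Time Estimate

$$ \hat{T}_0 = n_0 \;\ni\; u(n_0) = \min_{k \le T_d} u(k) $$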

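In addition, if the variances of the respective innovation processes are unknown, the unbiased sample
variance estimates below can be substituted in Equation (4.6):

$$ \hat{\sigma}^{2}(n) = \frac{1}{n-1} \sum_{k=1}^{n} [\, e(k) - \bar{e}(n) \,]^{2} \tag{4.10} $$

where

$$ \bar{e}(n) = \frac{1}{n} \sum_{j=1}^{n} e(j), \tag{4.11} $$

or the recursive forms may be utilized

$$ \hat{\sigma}^{2}(n) = \frac{n-2}{n-1}\, \hat{\sigma}^{2}(n-1) + [\, \bar{e}(n) - \bar{e}(n-1) \,]^{2} + \frac{1}{n-1}\, [\, e(n) - \bar{e}(n) \,]^{2} \tag{4.12} $$

where

$$ \bar{e}(n) = \frac{n-1}{n}\, \bar{e}(n-1) + \frac{1}{n}\, e(n). \tag{4.13} $$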
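
The recursive mean and variance updates can be checked against the batch definitions with a short sketch. It assumes real-valued innovations and starts the recursion from the first sample (mean equal to that sample, variance zero), which is an initialization chosen here for illustration; the function name `running_mean_variance` is hypothetical.

```python
import statistics


def running_mean_variance(e):
    """Recursive sample mean and unbiased sample variance of the sequence e."""
    mean, var = e[0], 0.0                      # state after the first sample
    for n in range(2, len(e) + 1):
        x = e[n - 1]
        new_mean = (n - 1) / n * mean + x / n  # recursive mean update
        var = ((n - 2) / (n - 1) * var         # recursive variance update
               + (new_mean - mean) ** 2
               + (x - new_mean) ** 2 / (n - 1))
        mean = new_mean
    return mean, var


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, var = running_mean_variance(data)
batch_var = statistics.variance(data)          # batch unbiased estimate for comparison
```

The recursion needs only the previous mean and variance, so it fits the sample-by-sample operation of the LMS filters without storing the innovation history.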

The schematic representing the collective preprocessing algorithms is shown in Figure 4-1. As a final
comment, the resulting feature vectors computed over the near-stationary intervals possess several properties
instrumental in contributing to good classification performance. Regarding invariances, scale invariance is
easily demonstrated by scaling both sides of Equation (2.1) by a constant $c$, and observing that the new process
$x(n) = c\,y(n)$ yields the same AR coefficients. Also from the same equation, translation invariance is
easily demonstrated by forming a new process $x(n) = y(n-k)$ and again observing that the coefficients
are identical.
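
Both invariance arguments can be written out in one line each. The symbols $x(n)$, $c$ (a nonzero constant), and $k$ (an integer delay) are introduced here for illustration only.

```latex
% Scale invariance: multiply Equation (2.1) by c and set x(n) = c y(n)
c\,y(n) = \sum_{i=1}^{M} w_i^{*}\,[\,c\,y(n-i)\,] + c\,e(n)
\quad\Longrightarrow\quad
x(n) = \sum_{i=1}^{M} w_i^{*}\,x(n-i) + c\,e(n)

% Translation invariance: evaluate Equation (2.1) at n-k and set x(n) = y(n-k)
y(n-k) = \sum_{i=1}^{M} w_i^{*}\,y(n-k-i) + e(n-k)
\quad\Longrightarrow\quad
x(n) = \sum_{i=1}^{M} w_i^{*}\,x(n-i) + e(n-k)
```

In both cases the coefficients $w_i$ are unchanged; only the innovation is scaled (its variance becomes $|c|^{2}\sigma^{2}$) or delayed.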

Furthermore, the characteristic equation ($1 - \sum_{i=1}^{M} w_i^{*} z^{-i} = 0$) representing
an asymptotically stationary (physical) AR process must have roots bounded in norm by unity. Hence, the AR
coefficients themselves are bounded, although not necessarily by unity. Yet for many applications,
including the present, the bound for the AR coefficients is empirically observed to be unity.

Consequently,  in  addition  to  the  compact  representation  being  a  sufficient  characterization  of  the 
stochastic  process,  the  feature  vectors  inherently  instill  a  degree  of  robustness  due  to  the  invariance 
properties. 
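
The way the two LMS filters feed the cumulative sum increment can be sketched end to end. This is an illustrative outline under several assumptions that go beyond the text: real-valued data, zero samples before the start of the record, innovation variances estimated recursively over the whole record for both filters, the Gaussian increment written in terms of the squared difference and difference in normalized squares, and no reinitialization after a detection. The names `update_stats`, `dual_window_increments`, and `y_demo` are hypothetical.

```python
import math


def update_stats(n, x, mean, var):
    """Recursive sample mean and unbiased variance after the n-th sample (n >= 1)."""
    if n == 1:
        return x, 0.0
    new_mean = (n - 1) / n * mean + x / n
    new_var = ((n - 2) / (n - 1) * var + (new_mean - mean) ** 2
               + (x - new_mean) ** 2 / (n - 1))
    return new_mean, new_var


def dual_window_increments(y, M, eta_a, eta_b):
    """Run global (eta_a) and local (eta_b) LMS filters and return the increments T(n)."""
    w_a, w_b = [0.0] * M, [0.0] * M
    mean_a = mean_b = var_a = var_b = 0.0
    increments = []
    for idx in range(len(y)):
        past = [y[idx - i] if idx - i >= 0 else 0.0 for i in range(1, M + 1)]
        e_a = y[idx] - sum(w * p for w, p in zip(w_a, past))   # global innovation
        e_b = y[idx] - sum(w * p for w, p in zip(w_b, past))   # local innovation
        mean_a, var_a = update_stats(idx + 1, e_a, mean_a, var_a)
        mean_b, var_b = update_stats(idx + 1, e_b, mean_b, var_b)
        if var_a > 0.0 and var_b > 0.0:
            t = 0.5 * (e_b * e_b / var_b - e_a * e_a / var_a
                       - (e_a - e_b) ** 2 / var_b + 1.0 - var_a / var_b)
        else:
            t = 0.0   # variance estimates not yet available
        increments.append(t)
        w_a = [w + 2.0 * eta_a * p * e_a for w, p in zip(w_a, past)]
        w_b = [w + 2.0 * eta_b * p * e_b for w, p in zip(w_b, past)]
    return increments


y_demo = [math.sin(0.3 * k) + 0.5 * math.cos(1.1 * k) for k in range(200)]
T_same = dual_window_increments(y_demo, 2, 0.01, 0.01)    # identical filters
T_dual = dual_window_increments(y_demo, 2, 0.005, 0.05)   # long versus short memory
```

When the two learning parameters coincide, the filters are identical and every increment vanishes, which is a convenient sanity check; the statistic only carries information once the memories of the two filters differ.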
5. RESULTS


Recall that the objective of preprocessing nonstationary signals is to produce a compact representation
(feature vector) of the data which simultaneously reduces data and maintains information. More specifically,
guided by the decomposition theorem, the data are first detrended and then segmented into near-stationary
intervals. Next, feature vectors containing the AR modeling coefficients are adaptively computed over each
interval.


An illustration of this adaptive preprocessing technique is shown in Figure 5-1. As the first transition
in spectral characteristics is encountered (at time $T_0$), the cumulative sum statistic ($u(n)$) displays a nonzero
drift. Upon reaching a threshold ($h$) at time $T_d$, the transition time is estimated ($\hat{T}_0$) and the AR modeling
coefficients determined over the now-defined near-stationary segment $I_1$. The elements comprising the
feature vector $V(I_1)$ are computed by simply averaging the instantaneous AR coefficients obtained from
the adaptive global AR modeling filter over the segment $I_1$. The process is then reinitialized at ($\hat{T}_0$), and
the subsequent feature vector $V(I_2)$ over the next interval $I_2$ (defined by the transition detector) is
computed similarly. The net result is a collection of feature vectors, each obtained over a near-stationary
interval, which can be utilized to characterize the nonstationary signal. Results from applying this general
preprocessing technique to real nonstationary data follow. The nonstationary signal utilized is a radar cross
section (RCS) versus time record obtained from a radar observing maneuvers from an object of interest.

The  spectral  transitions  observed  in  the  data  are  physically  produced  by  the  object  undergoing  specific 
maneuvers.  Hence,  the  applied  objective  of  preprocessing  is  to  automatically  produce  disparate  feature 
vectors  representing  each  of  the  maneuvers,  which  can  then  be  used  to  drive  an  appropriate  classification 
algorithm,  thereby  automatically  detecting  and  classifying  object  maneuvers. 

First, the detrending procedures necessary for the specific signals utilized are described. Next, the model order for the adaptive AR modeling filters is determined. With the model order fixed, transition detection and feature vector extraction results are presented. Finally, a summary of the preprocessing performance is presented.

5.1  DETRENDING

The  RCS  data  employed  are  shown  in  Figure  5-2.  An  experienced  radar  analyst  categorizes  the  data 
into  the  maneuvers  pitch,  roll,  and  yaw  (each  separated  by  a  stable  region).  Such  categorization  serves  as 
"truth"  and  is  used  later  for  evaluating  the  performance  of  the  transition  (maneuver)  detector.  Detrending 
begins  by  subtracting  out  the  overall  mean,  physically  corresponding  to  the  mean  RCS  of  the  object  which  is 
typically  known  a  priori.  Next,  the  data  are  high-pass  filtered  to  remove  low-frequency  trends  which  persist 
throughout  the  entire  data  record.  The  filtering  is  achieved  by  forming  a  new  detrended  sequence 

$$y(n) = y'(n) - y_{LP}(n) \tag{5.1}$$

where

$$y_{LP}(n) = \alpha\, y_{LP}(n-1) + (1-\alpha)\, y'(n) \tag{5.2}$$

is a unity DC-gain low-pass version of the zero-mean RCS data $y'(n)$. Typically with such data, the low-frequency components common throughout the record can be removed by modest filtering ($\alpha = 0.9999$). Consequently, a zero-mean, high-pass, detrended sequence results (as shown in Figure 5-2), which drives the subsequent AR modeling filter and transition detector.

Figure 5-1. General adaptive preprocessing procedures: (a) transition detected; (b) transition time estimated; (c) AR coefficients computed [$V(I_1)$]; and reinitialization.

Figure 5-2. Original and detrended radar data [panels: RCS (dB) and detrended RCS (dB)].
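
The two detrending steps can be sketched as follows. This is a minimal illustration rather than the report's code: it assumes the low-pass recursion takes the usual one-pole form with input weight (1 - alpha), which is consistent with the unity DC gain stated above, and it assumes the low-pass state starts at zero. The function name `detrend` is hypothetical.

```python
from typing import List, Sequence


def detrend(raw: Sequence[float], alpha: float = 0.9999) -> List[float]:
    """Remove the overall mean, then subtract a one-pole low-pass trend.

    Computes y(n) = y'(n) - y_LP(n), with
    y_LP(n) = alpha * y_LP(n-1) + (1 - alpha) * y'(n),
    where y'(n) is the zero-mean input.  Assumption: y_LP starts at 0.
    """
    mean = sum(raw) / len(raw)
    zero_mean = [v - mean for v in raw]
    y_lp = 0.0
    out = []
    for v in zero_mean:
        y_lp = alpha * y_lp + (1.0 - alpha) * v
        out.append(v - y_lp)
    return out


# A constant record has no content beyond its mean, so it detrends to zeros.
print(detrend([3.0] * 5))
# A small alpha is used here only to make the recursion easy to follow by hand.
print(detrend([1.0, -1.0, 1.0, -1.0], alpha=0.5))
```

With alpha close to 1, as in the text, the low-pass output changes very slowly, so only trends that persist across the whole record are removed.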

5.2  MODEL  ORDER 

Before  the  adaptive  modeling  filter  can  be  driven  by  the  detrended  time  series,  the  model  order 
(equivalently,  the  number  of  tap  weights  for  the  adaptive  transversal  filter)  must  be  determined.  Ideally,  the 
order  is  chosen  to  provide  parsimonious  modeling  [12].  That  is,  a  model  order  is  sought  which  provides  an 
acceptable  compromise  between  model  performance  and  complexity. 

Two conventional criteria for model order selection were investigated. The Akaike information criterion (AIC), both theoretically intuitive and practically effective, minimizes the distance between the true and observed distributions. For the AR formulation, AIC is given by [13]

$$\mathrm{AIC}(Q) = N \ln \sigma_e^2 + 2Q \tag{5.3}$$

where

$$Q = \text{AR model order} \tag{5.4}$$

$$N = \text{number of samples} \tag{5.5}$$

$$\sigma_e^2 = \text{modeling error variance.} \tag{5.6}$$

Notice that the first term in Equation (5.3) serves to penalize poor modeling, while the second penalizes increased complexity (through the number of tap weights $Q$). The AIC optimal model order is simply the order that minimizes Equation (5.3). Although intuitive and often effective, the AIC optimal model order is not consistent, and hence does not necessarily converge to the true model order.

A  consistent  model  order  estimator  investigated  is  the  minimum  description  length  (MDL)  given  by 
[14] 


$$\mathrm{MDL}(Q) = \frac{N}{2} \ln \sigma_e^2 + \frac{Q}{2} \ln N \tag{5.7}$$


This  criterion  minimizes  the  number  of  digits  necessary  to  encode  N  observations  and  often  results  in  a 
lower  model  order  than  AIC.  Similarly,  MDL  optimal  order  is  chosen  to  minimize  Equation  (5.7). 
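
As a hedged sketch of how the two criteria are applied, the snippet below evaluates both over a set of candidate orders and picks the minimizers. The MDL expression is taken in its usual form, (N/2) ln(error variance) + (Q/2) ln N. The error variances are hypothetical values chosen for illustration and are not taken from the report; the function names are likewise assumptions.

```python
import math
from typing import Dict, Tuple


def aic(n_samples: int, error_var: float, order: int) -> float:
    """AIC(Q) = N ln(sigma_e^2) + 2Q."""
    return n_samples * math.log(error_var) + 2 * order


def mdl(n_samples: int, error_var: float, order: int) -> float:
    """MDL(Q) = (N/2) ln(sigma_e^2) + (Q/2) ln N (usual form, assumed here)."""
    return 0.5 * n_samples * math.log(error_var) + 0.5 * order * math.log(n_samples)


def best_orders(n_samples: int, error_vars: Dict[int, float]) -> Tuple[int, int]:
    """Return the (AIC-optimal, MDL-optimal) orders among the candidates."""
    aic_order = min(error_vars, key=lambda q: aic(n_samples, error_vars[q], q))
    mdl_order = min(error_vars, key=lambda q: mdl(n_samples, error_vars[q], q))
    return aic_order, mdl_order


# Hypothetical modeling error variances per candidate order (illustrative only).
error_vars = {1: 1.0, 2: 0.5, 3: 0.4995, 4: 0.4994}
print(best_orders(8000, error_vars))  # (3, 2)
```

In this made-up case the heavier MDL complexity penalty selects a lower order than AIC, which is the tendency noted above; with a sharper drop in error variance the two criteria agree.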

Both techniques for determining model order yielded identical optimal orders. The similarity can be attributed to the relatively large number of data samples ($N = 8000$). Observing Equations (5.3) and (5.7), for large $N$ the first terms dominate, causing AIC and MDL values to differ by only a constant factor. Consequently, minimal values denoting optimal order were achieved at identical orders. The results are shown in Figure 5-3 along with the AIC and MDL optimal model orders for each maneuver.

Figure 5-3. AIC model order results.


Unfortunately, the optimal model orders vary for each maneuver. Therefore, to achieve constant model order throughout all maneuvers (constant model order is desired for fast automatic preprocessor operation without a priori maneuver knowledge), an alternate technique for determining the "best" model order is required. Specifically, the best model order chosen represents the lowest order model which provides good discrimination amongst the feature vectors (AR coefficients) for the various maneuvers. From Table 5-1, the best model order for the RCS data is seen to be 4, since little disparity amongst the feature vectors exists for lower orders. Note that $M = 4$ is also the average model order amongst the optimal orders selected by AIC and MDL for the maneuvers.
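
One way to make "good discrimination" concrete is to compute, for each model order, the smallest pairwise distance between the maneuver feature vectors of Table 5-1. The sketch below does this with Euclidean distance; the choice of distance measure and the names `TABLE_5_1` and `min_separation` are assumptions made for illustration, not the report's stated procedure. The coefficient values are those listed in Table 5-1.

```python
import math
from itertools import combinations
from typing import Dict, List

# AR coefficients from Table 5-1, keyed by model order and maneuver.
TABLE_5_1: Dict[int, Dict[str, List[float]]] = {
    1: {"stable": [0.0], "pitch": [0.93], "roll": [0.91], "yaw": [0.79]},
    2: {"stable": [0.0, 0.0], "pitch": [0.61, 0.29],
        "roll": [0.57, 0.39], "yaw": [0.47, 0.37]},
    3: {"stable": [0.0, 0.0, 0.0], "pitch": [0.61, 0.30, 0.04],
        "roll": [0.41, 0.25, 0.22], "yaw": [0.37, 0.24, 0.25]},
    4: {"stable": [0.0, 0.0, 0.0, 0.0], "pitch": [0.58, 0.32, 0.04, -0.06],
        "roll": [0.39, 0.23, 0.14, 0.06], "yaw": [0.32, 0.19, 0.17, 0.21]},
}


def min_separation(vectors: Dict[str, List[float]]) -> float:
    """Smallest Euclidean distance between any two maneuver feature vectors."""
    return min(math.dist(a, b) for a, b in combinations(vectors.values(), 2))


separations = {order: min_separation(v) for order, v in TABLE_5_1.items()}
for order, sep in separations.items():
    print(order, round(sep, 3))
best_order = max(separations, key=separations.get)
print("largest minimum separation at order", best_order)
```

Under this particular measure the closest pair of maneuvers is furthest apart at order 4, which agrees with the choice made in the text.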




TABLE 5-1

AR Coefficients for Varying Model Order

Model Order    STABLE    PITCH     ROLL      YAW

    1            0        0.93     0.91     0.79

    2            0        0.61     0.57     0.47
                 0        0.29     0.39     0.37

    3            0        0.61     0.41     0.37
                 0        0.30     0.25     0.24
                 0        0.04     0.22     0.25

    4            0        0.58     0.39     0.32
                 0        0.32     0.23     0.19
                 0        0.04     0.14     0.17
                 0       -0.06     0.06     0.21


5.3  PREPROCESSOR  RESULTS 

With the RCS data detrended and the model order now selected, transition detection and feature vector extraction can be performed. Expanded results are presented in the region of each transition, thereby enabling evaluation of both the transition detector and the production of the feature vectors. For each transition, a portion of the detrended signal is shown with two cumulative sum statistics. The first follows from Equation (3.8) and exhibits the desired drift behavior for transitions from low to high variance. The second statistic is computed in parallel by simply interchanging the roles of $\eta_a$ and $\eta_b$ in Equation (4.4), offering better drift behavior for high- to low-variance transitions. A threshold of 100 was used throughout for both statistics. On subsequent graphs for each transition, the instantaneous AR coefficients (tap weights of the adaptive transversal filters) for both the global and local modeling filters are shown with the accompanying model error (innovations). Averaging the AR coefficients over the detected intervals constitutes the corresponding feature vector.
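
The relationship between the instantaneous tap weights and the feature vector can be sketched with a plain LMS one-step predictor. This is an illustrative sketch for a real-valued signal, not the report's filter: the names `lms_ar` and `feature_vector`, the step size, and the synthetic AR(1) test signal are all assumptions.

```python
import random
from typing import List, Sequence, Tuple


def lms_ar(signal: Sequence[float], order: int,
           step: float) -> Tuple[List[List[float]], List[float]]:
    """Adaptive AR modeling with the LMS update (real-valued sketch).

    e(n) = y(n) - sum_i w_i * y(n-i);   w_i <- w_i + step * e(n) * y(n-i)
    Returns the tap weights after every update and the innovations e(n).
    """
    w = [0.0] * order
    weights, innovations = [], []
    for n in range(order, len(signal)):
        past = [signal[n - i] for i in range(1, order + 1)]
        e = signal[n] - sum(wi * yi for wi, yi in zip(w, past))
        w = [wi + step * e * yi for wi, yi in zip(w, past)]
        weights.append(w)
        innovations.append(e)
    return weights, innovations


def feature_vector(weights: List[List[float]], start: int, stop: int) -> List[float]:
    """Average the instantaneous AR coefficients over one detected interval."""
    segment = weights[start:stop]
    return [sum(col) / len(segment) for col in zip(*segment)]


# Synthetic near-stationary segment: AR(1) process with coefficient 0.8.
random.seed(0)
y = [0.0]
for _ in range(4999):
    y.append(0.8 * y[-1] + random.gauss(0.0, 1.0))

weights, innovations = lms_ar(y, order=2, step=0.01)
fv = feature_vector(weights, 3000, len(weights))
print([round(c, 2) for c in fv])  # expected to lie near (0.8, 0.0)
```

Averaging over the interval smooths the sample-to-sample jitter of the LMS weights, which is why the averaged vector is a more stable feature than any single instantaneous weight vector.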

Stable-Pitch Transition: Results are shown in Figures 5-4 to 5-6. The transition occurs at $n = 500$ and is detected (the statistic exceeds threshold) near 600. Subsequent estimation of the transition location [initial point of positive drift in the statistic, or equivalently, the local minimum of Equation (3.8)] is within samples of the actual location. The feature vector representing the stable region is computed by averaging the instantaneous AR global model coefficients over the first interval detected $(0, \hat{T}_1)$, yielding approximately $V(I_1) = (0, 0, 0, 0)$. Hence the stable region is modeled as white noise.

Figure 5-4. Stable-pitch transition detection results.

Figure 5-5. Stable-pitch global modeling results.

Figure 5-6. Stable-pitch local modeling results.

Pitch-Stable Transition: Results are shown in Figures 5-7 to 5-12. The transition occurs at $n = 1276$, and the appropriate test statistic (for high- to low-variance transitions) fails to surpass threshold, and is therefore incapable of detecting the transition within an acceptable delay time. However, by examining the behavior of the statistic, modifications can be employed to ensure detection.

Figure 5-7. Pitch-stable transition detection results.

Figure 5-8. Pitch-stable global modeling results.

Figure 5-9. Pitch-stable local modeling results.

The behavior observed is due to the test statistic $u(n)$ having an asymmetrical unconditional drift. That is, the drift in $u(n)$ experienced by a signal transition from a distribution $P_a$ to $P_b$ is not equivalent in magnitude to the drift incurred from the transition $P_b$ to $P_a$. Specifically, for the signal analyzed (see Figure 5-7), a transition from high to low variance (signal power) yields slow linear behavior in the statistic following the transition.

The linear property follows from the post-transition signal (having low variance) being relatively small in magnitude. From Equation (4.3), small $y(n)$ yields small innovations $e(n)$, since typically $|w_i| < 1$. Furthermore, by examining Equation (4.4), small innovations $e(n)$ coupled with small signals $y(n-1)$ cause very little adaption in the filter tap weights ($w$), as verified by Figures 5-8 and 5-9. Now with the filters practically fixed [$w_a(k) = w_b(k)$] and the innovations $e_a(k)$ and $e_b(k)$ small, the statistic $T(k)$ is practically reduced to [see Equation (4.6)]

(5.8)

and  consequently,  the  cumulative  sum  statistic  reduces  to 

n 

k  =1 

2 

resulting  in  the  linear  behavior  for 

One approach to circumventing such behavior is to normalize the signal, so that the post-transition signal is sufficiently large to drive the filter adaption through the innovation process. Results from the normalized pitch-stable transition are shown in Figures 5-10 to 5-12. Notice that the transition is detected near 1330, and the estimated transition time is within samples of the actual location.
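
The sensitivity of the adaption to signal level can be seen directly from the LMS weight update, which is proportional to the product of the innovation and the past sample. The sketch below is illustrative only: the report does not state how the signal was normalized, so the unit-RMS scaling in `normalize_rms` is an assumption, as are the function names and numbers.

```python
import math
from typing import List, Sequence


def lms_update(w: Sequence[float], past: Sequence[float],
               current: float, step: float) -> List[float]:
    """Weight increment step * e(n) * y(n-i) for fixed weights w."""
    e = current - sum(wi * yi for wi, yi in zip(w, past))
    return [step * e * yi for yi in past]


def normalize_rms(x: Sequence[float]) -> List[float]:
    """Scale a segment to unit RMS (one possible normalization, assumed here)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return [v / rms for v in x]


w = [0.5, 0.2]
full = lms_update(w, past=[1.0, -2.0], current=1.5, step=0.01)
# The same samples scaled by 0.1, as after a high- to low-variance transition.
small = lms_update(w, past=[0.1, -0.2], current=0.15, step=0.01)
ratios = [s / f for s, f in zip(small, full)]
print(ratios)  # each ratio is about 0.01, the square of the 0.1 scale factor

unit = normalize_rms([3.0, -4.0])
print(sum(v * v for v in unit) / len(unit))  # about 1.0
```

Because the update shrinks with the square of the signal amplitude, a tenfold drop in level slows adaption a hundredfold, which is consistent with the nearly frozen tap weights seen after the transition.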

The feature vector representing the pitch region is computed by averaging the instantaneous AR coefficients over the second interval $(\hat{T}_1, \hat{T}_2)$, yielding $V(I_2) = (0.58, 0.32, 0.04, -0.06)$. Having only two significantly nonzero components tends to substantiate the AIC and MDL optimal second order.


(5.9) 
Figure 5-10. Normalized pitch-stable transition detection results.

Figure 5-11. Normalized pitch-stable global modeling results.

Figure 5-12. Normalized pitch-stable local modeling results.

Stable-Roll Transition: Results are shown in Figures 5-13 to 5-15. The transition occurs at $n = 400$, is detected near 500, and its location is estimated within samples of the actual occurrence. Consequently, another feature vector representing the stable region is computed over the third interval, similarly yielding approximately $V(I_3) = 0$.

Figure 5-13. Stable-roll transition detection results.

Figure 5-14. Stable-roll global modeling results.

Figure 5-15. Stable-roll local modeling results.

Roll-Stable Transition: Results are shown in Figures 5-16 to 5-19. The transition occurs at $n = 2319$, and the appropriate statistic (for high- to low-variance transitions) fails to surpass threshold. However, for the same reasons described in the pitch-stable transition section, the performance can be improved by normalizing the signal.

Figure 5-16. Roll-stable transition detection results.

Figure 5-17. Roll-stable global modeling results.

Figure 5-18. Roll-stable local modeling results.

Normalized roll-stable transition results are shown in Figures 5-19 to 5-21. Notice that two significant regions of positive drift are exhibited by the statistic, although only the last drift exceeds threshold. The latter coincides with the actual transition, yielding an estimated transition time near the truth. The former, initiated at $n = 1000$, appears to indicate a false transition. However, conversations with an experienced analyst [15] revealed that typical roll maneuvers comprise three submaneuvers, representing acceleration, constant velocity, and deceleration. Consequently, the transition detector is attempting to detect submaneuver transitions as well.

The feature vector representing the roll maneuver over the detected interval is (0.39, 0.23, 0.14, 0.06). Here again, the relative magnitudes of the feature vector components substantiate the AIC and MDL optimal model order.

Figure 5-19. Normalized roll-stable transition detection results.

Figure 5-20. Normalized roll-stable global modeling results.

Figure 5-21. Normalized roll-stable local modeling results.

Stable-Yaw Transition: Finally, these results are shown in Figures 5-22 to 5-24. The transition occurs at $n = 1502$, is detected near 1600, and again the location is estimated within samples of the true transition.

Feature vectors representing a stable region and the final yaw maneuver are calculated over the last two detected intervals, the latter extending to the end of the record ($N$). The results are approximately 0 and (0.32, 0.19, 0.17, 0.21), respectively.

Figure 5-22. Stable-yaw transition detection results.

Figure 5-23. Stable-yaw global modeling results.

Figure 5-24. Stable-yaw local modeling results.
5.4  SUMMARY


Application of the preprocessing algorithms (Section 4), incorporating spectral transition detection and adaptive AR filtering (feature vector extraction), to nonstationary RCS data was reported. All filtering and detection parameters [including learning rates ($\eta_a$, $\eta_b$), filter orders ($M_a$, $M_b$), bias ($\delta$), and threshold ($h$)] were held constant throughout signal preprocessing. Therefore, the performance described is indicative of what can be achieved with a fixed set of parameters.

The data were detrended by removing the mean value and then selectively high-pass filtering. Results from detecting spectral transitions (physically representing object maneuvers) embedded in the detrended signal are summarized in Figure 5-25. All transitions were detected within samples of the actual locations. Detection performance was improved for signal transitions from high to low variance (pitch-stable, roll-stable) by normalizing the signal.

The overall performance of the adaptive preprocessor on real nonstationary data is illustrated in Figure 5-26. The 8000-sample data record was reduced to 7 disparate feature vectors, each of dimension 4. Moreover, the amount of information maintained in the feature vector representation can be gauged by the innovations (modeling error) sequence shown in Figure 5-26. With the exception of the roll maneuver, where the transition detector attempted to detect the roll submaneuvers, the sequence is near white, indicating near-complete statistical information extraction.
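
One simple way to gauge how close an innovations sequence is to white is to check how many of its sample autocorrelation values fall outside the approximate 95% bounds for white noise. The report does not say how whiteness was assessed, so the test below, its bound of 1.96 divided by the square root of the sample count, and the function names are assumptions offered only as a sketch.

```python
import math
import random
from typing import List, Sequence


def sample_autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Normalized sample autocorrelation at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / c0
        for lag in range(1, max_lag + 1)
    ]


def fraction_outside_white_bounds(x: Sequence[float], max_lag: int = 20) -> float:
    """Fraction of lags whose autocorrelation exceeds the white-noise bound."""
    bound = 1.96 / math.sqrt(len(x))
    rho = sample_autocorrelation(x, max_lag)
    return sum(1 for r in rho if abs(r) > bound) / max_lag


random.seed(1)
white = [random.gauss(0.0, 1.0) for _ in range(2000)]
colored = [0.0]
for _ in range(1999):
    colored.append(0.9 * colored[-1] + random.gauss(0.0, 1.0))

white_fraction = fraction_outside_white_bounds(white)
colored_fraction = fraction_outside_white_bounds(colored)
print(white_fraction, colored_fraction)
```

A small fraction suggests that little predictable structure remains in the modeling error, while a large fraction (as for the strongly correlated synthetic sequence) indicates information the model has not captured.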
Figure 5-25. Summary of transition detection results.

Figure 5-26. Adaptive preprocessor performance.
6.  CONCLUSION


An  adaptive  preprocessor  was  introduced  to  provide  a  compact  representation  of  nonstationary  data. 
The  representation,  consisting  of  AR  coefficient  vectors  computed  over  near-stationary  segments,  is  scale 
and  translation  invariant,  normalized,  and  statistically  sufficient. 

Guided  by  a  decomposition  theorem,  the  preprocessor  was  constructed  using  adaptive  AR  modeling 
filters  and  a  transition  detector.  The  information-theoretic  transition  detector  driven  by  the  parallel  adaptive 
AR  modeling  filters  successfully  detected  all  transitions  within  the  radar  signature  analyzed,  thereby  yielding 
segments  of  near-stationary  data.  Also,  the  feature  vectors  produced  by  the  modeling  filters  over  the  detected 
near-stationary segments were sufficiently distinct, thus supporting automatic classification. Practically all the sensor statistical information (throughout 8000 samples) was retained in the 7 compact feature vectors (each of dimension 4), as substantiated by the resulting near-white innovation process.

Regarding  future  research  suggestions,  both  components  of  the  preprocessor  could  be  enhanced.  For 
example,  alternative  recursive  least-squares  algorithms  with  faster  convergence  times  might  be  employed  for 
the  adaptive  AR  modeling  filter,  currently  utilizing  the  LMS  adaption  algorithm.  Also,  new  transition 
detector  statistics  might  be  explored  which  yield  symmetrical  behavior,  thereby  avoiding  the  normalization 
requirement  when  encountering  high-  to  low-variance  transitions.  Or  alternatively,  automatic  gain  control 
circuitry  could  possibly  be  employed  to  automatically  normalize  the  data. 
APPENDIX A
AR  AUTOCORRELATION 


Assuming the process $y(n)$ is stationary, the autocorrelation function is

$$r(m) = E\{y(n+m)\, y^*(n)\}, \tag{A.1}$$

recall

$$y(n) = \sum_{i=1}^{M} w_i^*\, y(n-i) + e(n) \tag{A.2}$$

so that

$$r(m) = E\left\{\left(\sum_{i=1}^{M} w_i^*\, y(n+m-i) + e(n+m)\right) y^*(n)\right\} = \sum_{i=1}^{M} w_i^*\, r(m-i) + E\{e(n+m)\, y^*(n)\}. \tag{A.3}$$

To evaluate the last term, consider the linear filter interpretation [i.e., $y(n)$ is the output of an infinite impulse response (IIR) filter]

$$y(n) = h(n) * e(n) = \sum_{k=0}^{\infty} h(k)\, e(n-k). \tag{A.4}$$

Hence,

$$r(m) = \sum_{i=1}^{M} w_i^*\, r(m-i) + E\left\{e(n+m) \sum_{k=0}^{\infty} h^*(k)\, e^*(n-k)\right\} = \sum_{i=1}^{M} w_i^*\, r(m-i) + \sum_{k=0}^{\infty} h^*(k)\, E\{e(n+m)\, e^*(n-k)\}, \tag{A.5}$$

and under the white noise residual assumption

$$E\{e(n+m)\, e^*(n-k)\} = \begin{cases} \sigma_e^2, & k = -m \\ 0, & k \neq -m \end{cases} \tag{A.6}$$

consequently, since the sum in Equation (A.5) runs over $k \geq 0$ only,

$$r(m) = \sum_{i=1}^{M} w_i^*\, r(m-i), \qquad m = 1, 2, \ldots \tag{A.7}$$

Finally, substituting the lags $m = 1, 2, \ldots, M$ into the autocorrelation expression yields the linear matrix equation for the AR coefficients

$$\begin{aligned}
r(1) &= w_1^*\, r(0) + w_2^*\, r(-1) + \cdots + w_M^*\, r(1-M) \\
r(2) &= w_1^*\, r(1) + w_2^*\, r(0) + \cdots + w_M^*\, r(2-M) \\
&\;\;\vdots \\
r(M) &= w_1^*\, r(M-1) + w_2^*\, r(M-2) + \cdots + w_M^*\, r(0).
\end{aligned} \tag{A.8}$$

Upon defining the correlation matrix

$$R = E\{\mathbf{y}(n-1)\, \mathbf{y}^H(n-1)\} = E\left\{ \begin{pmatrix} y(n-1) \\ y(n-2) \\ \vdots \\ y(n-M) \end{pmatrix} \begin{pmatrix} y^*(n-1) & y^*(n-2) & \cdots & y^*(n-M) \end{pmatrix} \right\} = \begin{pmatrix} r(0) & r(1) & \cdots & r(M-1) \\ r(-1) & r(0) & \cdots & r(M-2) \\ \vdots & \vdots & & \vdots \\ r(-M+1) & r(-M+2) & \cdots & r(0) \end{pmatrix} \tag{A.9}$$

and

$$\mathbf{r} = \begin{pmatrix} r(-1) \\ r(-2) \\ \vdots \\ r(-M) \end{pmatrix} \tag{A.10}$$

the AR coefficients are seen to satisfy

$$R^T \mathbf{w}^* = \mathbf{r}^* \quad \text{or} \quad R^H \mathbf{w} = \mathbf{r}, \quad \text{that is,} \quad R\,\mathbf{w} = \mathbf{r} \tag{A.11}$$

since the correlation matrix $R$ is Hermitian.
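
For a real-valued process, where r(-m) = r(m), the matrix equation above can be solved directly for the AR coefficients. The sketch below is an illustration with hypothetical names (`solve_linear`, `yule_walker`) and a hand-built AR(2) example; it uses plain Gaussian elimination rather than any particular solver the report may have used.

```python
from typing import List, Sequence


def solve_linear(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def yule_walker(r: Sequence[float], order: int) -> List[float]:
    """Solve R w = r for a real process, given r[0..order] = r(0)..r(order)."""
    big_r = [[r[abs(i - j)] for j in range(order)] for i in range(order)]
    rhs = [r[m] for m in range(1, order + 1)]
    return solve_linear(big_r, rhs)


# Autocorrelation of a real AR(2) process with w1 = 0.5, w2 = 0.3, built from
# the lag recursion r(m) = w1 r(m-1) + w2 r(m-2) with r(0) normalized to 1.
w1, w2 = 0.5, 0.3
r0 = 1.0
r1 = w1 / (1.0 - w2) * r0
r2 = w1 * r1 + w2 * r0
w = yule_walker([r0, r1, r2], order=2)
print([round(c, 6) for c in w])  # [0.5, 0.3]
```

Recovering the coefficients that generated the autocorrelation sequence is a quick consistency check on the lag equations.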
APPENDIX B

CONDITIONAL  DISTRIBUTION  OF  PARTITIONED 
GAUSSIAN  RANDOM  VECTOR 

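Let a Gaussian random vector be partitioned as

$$z = \begin{pmatrix} x \\ y \end{pmatrix} \tag{B.3}$$

having a joint distribution

$$p(z) = \frac{1}{(2\pi)^{n/2}\, |\Sigma|^{1/2}} \exp\left\{ -\tfrac{1}{2} (z - \bar{z})^H \Sigma^{-1} (z - \bar{z}) \right\} \tag{B.4}$$

and a partitioned covariance matrix given by

$$\Sigma = E\{(z - \bar{z})(z - \bar{z})^H\} = \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix} \tag{B.5}$$

where

$$\bar{z} = E\{z\} \tag{B.6}$$

and $n$ is the dimension of $z$.
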

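The aim is to express the conditional distribution $p(x|y)$ in terms of the partitioned covariances $\Sigma_{xx}$, $\Sigma_{xy}$, $\Sigma_{yx}$, and $\Sigma_{yy}$. Such a task is easily accomplished if $\Sigma$ can be massaged into block-diagonal form. That is, a transformation on $z$ is sought to simplify the exponential argument in Equation (B.4), yielding

$$\begin{aligned}
(z - \bar{z})^H \Sigma^{-1} (z - \bar{z}) &= (s - \bar{s})^H \begin{pmatrix} C_{11} & 0 \\ 0 & C_{22} \end{pmatrix}^{-1} (s - \bar{s}) \\
&= (s - \bar{s})^H \begin{pmatrix} C_{11}^{-1} & 0 \\ 0 & C_{22}^{-1} \end{pmatrix} (s - \bar{s}) \\
&= (s_1 - \bar{s}_1)^H C_{11}^{-1} (s_1 - \bar{s}_1) + (s_2 - \bar{s}_2)^H C_{22}^{-1} (s_2 - \bar{s}_2)
\end{aligned} \tag{B.7}$$

where $s$ is the appropriately partitioned vector

$$s = \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}. \tag{B.8}$$
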

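The following identity is used to provide the partitioned diagonalization. For $\Sigma$ positive definite and $\Sigma_{yy}$ square, let

$$B = \begin{pmatrix} I & -\Sigma_{xy} \Sigma_{yy}^{-1} \\ 0 & I \end{pmatrix} \tag{B.9}$$

then

$$C = B \Sigma B^H = \begin{pmatrix} \Sigma_{xx} - \Sigma_{xy} \Sigma_{yy}^{-1} \Sigma_{yx} & 0 \\ 0 & \Sigma_{yy} \end{pmatrix} = \begin{pmatrix} C_{11} & 0 \\ 0 & C_{22} \end{pmatrix} \tag{B.10}$$

and is easily verified upon substitution.

Now, $\Sigma^{-1}$ can be computed in partitioned form

$$\Sigma^{-1} = \left(B^{-1} C B^{-H}\right)^{-1} = B^H C^{-1} \left(B^{-1}\right)^{-1} = B^H C^{-1} B = B^H \begin{pmatrix} C_{11}^{-1} & 0 \\ 0 & C_{22}^{-1} \end{pmatrix} B. \tag{B.11}$$

Also, since $|B| = 1$, the determinant can be computed by

$$|\Sigma| = \left|B^{-1} C B^{-H}\right| = \left|B^{-1}\right| |C| \left|B^{-H}\right| = |C_{11}|\, |C_{22}|. \tag{B.12}$$
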

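Consequently, using Equations (B.11) and (B.12), the joint distribution [Equation (B.4)] becomes

$$p(z) = \frac{1}{(2\pi)^{n/2}\, |C_{11}|^{1/2}\, |C_{22}|^{1/2}} \exp\left\{ -\tfrac{1}{2} (z - \bar{z})^H B^H C^{-1} B (z - \bar{z}) \right\} \tag{B.13}$$

and is written compactly utilizing the coordinate transformation

$$s = B z = \begin{pmatrix} x - \Sigma_{xy} \Sigma_{yy}^{-1} y \\ y \end{pmatrix} \tag{B.14}$$

so that

$$p(s) = \frac{1}{(2\pi)^{n/2}\, |C_{11}|^{1/2}\, |C_{22}|^{1/2}} \exp\left\{ -\tfrac{1}{2} (s - \bar{s})^H C^{-1} (s - \bar{s}) \right\}. \tag{B.15}$$

Finally, upon substituting Equation (B.10) into (B.15), the joint distribution in partitioned form becomes

$$\begin{aligned}
p(x, y) = {} & \frac{1}{(2\pi)^{n/2}\, \left|\Sigma_{xx} - \Sigma_{xy} \Sigma_{yy}^{-1} \Sigma_{yx}\right|^{1/2}\, |\Sigma_{yy}|^{1/2}} \\
& \cdot \exp\left\{ -\tfrac{1}{2} \left[x - \left(\bar{x} + \Sigma_{xy} \Sigma_{yy}^{-1} (y - \bar{y})\right)\right]^H \left(\Sigma_{xx} - \Sigma_{xy} \Sigma_{yy}^{-1} \Sigma_{yx}\right)^{-1} \left[x - \left(\bar{x} + \Sigma_{xy} \Sigma_{yy}^{-1} (y - \bar{y})\right)\right] \right\} \\
& \cdot \exp\left\{ -\tfrac{1}{2} (y - \bar{y})^H \Sigma_{yy}^{-1} (y - \bar{y}) \right\}
\end{aligned} \tag{B.16}$$

while the conditional distribution in partitioned form becomes

$$\begin{aligned}
p(x|y) = \frac{p(x, y)}{p(y)} = {} & \frac{1}{(2\pi)^{n_x/2}\, \left|\Sigma_{xx} - \Sigma_{xy} \Sigma_{yy}^{-1} \Sigma_{yx}\right|^{1/2}} \\
& \cdot \exp\left\{ -\tfrac{1}{2} \left[x - \left(\bar{x} + \Sigma_{xy} \Sigma_{yy}^{-1} (y - \bar{y})\right)\right]^H \left(\Sigma_{xx} - \Sigma_{xy} \Sigma_{yy}^{-1} \Sigma_{yx}\right)^{-1} \left[x - \left(\bar{x} + \Sigma_{xy} \Sigma_{yy}^{-1} (y - \bar{y})\right)\right] \right\}
\end{aligned} \tag{B.17}$$

where $n_x$ is the dimension of $x$.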
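
The conditional mean and covariance that appear in the conditional distribution can be checked numerically in the simplest case, where x and y are both real scalars. The sketch below is only a sanity check under that assumption; the function names and the numerical values are hypothetical.

```python
import math
from typing import Tuple


def conditional_gaussian(mean_x: float, mean_y: float, s_xx: float,
                         s_xy: float, s_yy: float, y: float) -> Tuple[float, float]:
    """Mean and variance of x given y for a real bivariate Gaussian."""
    gain = s_xy / s_yy
    return mean_x + gain * (y - mean_y), s_xx - gain * s_xy


def normal_pdf(v: float, mean: float, var: float) -> float:
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def joint_pdf(x: float, y: float, mean_x: float, mean_y: float,
              s_xx: float, s_xy: float, s_yy: float) -> float:
    """Bivariate Gaussian density written out for a 2 x 2 covariance."""
    det = s_xx * s_yy - s_xy ** 2
    dx, dy = x - mean_x, y - mean_y
    quad = (s_yy * dx * dx - 2 * s_xy * dx * dy + s_xx * dy * dy) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))


params = dict(mean_x=1.0, mean_y=2.0, s_xx=4.0, s_xy=1.5, s_yy=3.0)
cond_mean, cond_var = conditional_gaussian(y=4.0, **params)
print(cond_mean, cond_var)  # 2.0 3.25

# p(x, y) / p(y) should equal the Gaussian with the conditional mean and variance.
ratio = joint_pdf(0.5, 4.0, **params) / normal_pdf(4.0, params["mean_y"], params["s_yy"])
direct = normal_pdf(0.5, cond_mean, cond_var)
print(math.isclose(ratio, direct))  # True
```

The agreement between the density ratio and the directly evaluated conditional density mirrors the factorization of the joint distribution into a conditional term and a marginal term.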
APPENDIX C

DERIVATION  OF  TEST  STATISTIC  FOR  AR  GAUSSIAN  PROCESS 

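The test statistic given by

$$u(n) = \sum_{k=1}^{n} T(k) \tag{C.1}$$

where

$$T(k) = \log \frac{p_a\big(y(k)\,|\,\mathbf{y}(k-1)\big)}{p_b\big(y(k)\,|\,\mathbf{y}(k-1)\big)} - \int p_a\big(y(k)\,|\,\mathbf{y}(k-1)\big) \log \frac{p_a\big(y(k)\,|\,\mathbf{y}(k-1)\big)}{p_b\big(y(k)\,|\,\mathbf{y}(k-1)\big)}\, dy \tag{C.2}$$
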


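is derived for the zero-mean AR Gaussian process $y(n)$

$$y(n) = \mathbf{w}^H \mathbf{y}(n-1) + e(n) \tag{C.3}$$

where $e(n)$ is a white Gaussian noise process with variance $\sigma_e^2$. First, the conditional distribution needed for deriving the test statistic Equation (C.2) is obtained by defining the $(M+1) \times 1$ partitioned random vector

$$z = \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} y(k) \\ \mathbf{y}(k-1) \end{pmatrix} \tag{C.4}$$

with appropriate partitioned covariance matrices

$$\Sigma_{uu} = E\{y(k)\, y(k)^H\} = r(0) \tag{C.5}$$

$$\Sigma_{vu} = E\{\mathbf{y}(k-1)\, y(k)^H\} = \mathbf{r} \tag{C.6}$$

$$\Sigma_{vv} = E\{\mathbf{y}(k-1)\, \mathbf{y}(k-1)^H\} = R. \tag{C.7}$$

Next, the results of Appendix B are applied (substituting $u = x$, $v = y$), yielding

$$p\big(y(k)\,|\,\mathbf{y}(k-1)\big) = K' \exp\left\{ -\tfrac{1}{2} \left[y(k) - \mathbf{r}^H R^{-1} \mathbf{y}(k-1)\right]^H \left(r(0) - \mathbf{r}^H R^{-1} \mathbf{r}\right)^{-1} \left[y(k) - \mathbf{r}^H R^{-1} \mathbf{y}(k-1)\right] \right\} \tag{C.9}$$

where

$$K' = \left(2\pi \left[r(0) - \mathbf{r}^H R^{-1} \mathbf{r}\right]\right)^{-1/2}. \tag{C.10}$$

Employing the AR modeling assumption (i.e., $R\,\mathbf{w} = \mathbf{r}$ and $\sigma_e^2 = r(0) - \mathbf{w}^H \mathbf{r}$), the distribution reduces to

$$p\big(y(k)\,|\,\mathbf{y}(k-1)\big) = K \exp\left\{ -\frac{|e(k)|^2}{2\sigma_e^2} \right\} \tag{C.11}$$

where

$$K = \left(2\pi \sigma_e^2\right)^{-1/2} \tag{C.12}$$

and

$$e(k) = y(k) - \mathbf{w}^H \mathbf{y}(k-1) \tag{C.13}$$

represents the AR modeling error (innovation) process [Equation (2.9)]. Therefore, the steady state test statistic [Equation (C.2)] for the AR Gaussian process [Equation (C.3)] is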


r(;-)  =  iog  -jf-- 


(C.14) 
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where 


( K\ 


log 


Kh) 


-  ^  J  )¥y  +  ^  1 1^0'  f  Pi^ai  y)¥y 


C  — oo 


^cLy)  =  y  ~  y^^  ■ 

e^{y)  =  y  -  y(^  -  D 

Pi^aiy))  =  ^  (^’^J- 

Evaluating  the  first  integral, 

oo 

^1=  i  Jkfl(v)|'p(^a(>’)Kv ; 

^  — oo 

a 

with  a  change  of  variables 


Evaluating  the  second  integral. 


^2  =  -^\\^hiy)\~Pi^aiy))dy 


(C.15) 

(C.16) 

(C.17) 

(C.18) 


(C.19) 

(C.20) 
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+  [^fc(.v)-  >)]<(>’ )  +  K(>’)l^ 


\ 

p{uy)¥y 


2cyi 


e^ik)  -  ej^k),  +(^^(A:)-  ej.k))  E 


Combining  the  results.  Equation  (C.14)  becomes 


T  {k)  =  \ 


(C.21) 